Ground Truth for Layout Analysis Performance Evaluation

Authors

  • Apostolos Antonacopoulos
  • Dimosthenis Karatzas
  • David Bridson

Abstract

Over the past two decades, a significant number of layout analysis (page segmentation and region classification) approaches have been proposed in the literature. Each approach has been devised for and/or evaluated using (usually small) application-specific datasets. While the need for objective performance evaluation of layout analysis algorithms is evident, no suitable dataset exists with ground truth that reflects the realities of everyday documents (widely varying layouts, complex entities, colour, noise, etc.). The most significant impediment is the creation of accurate and flexible (in representation) ground truth, a task that is costly and must be carefully designed. This paper discusses the issues related to the design, representation and creation of ground truth in the context of a realistic dataset developed by the authors. The effectiveness of the ground truth discussed in this paper has been demonstrated through its use in two international page segmentation competitions (ICDAR2003 and ICDAR2005).

Similar resources

Automatic Ground-Truth Generation for Skew-Tolerance Evaluation of Document Layout Analysis Methods

Generation of ground-truths is of great importance for unbiased performance evaluation of document layout analysis methods. This is especially necessary because many methods are claimed to be skew-tolerant. However, experimental evaluation of this fact is often based only on human subjective judgement and restricted to a few experiments. The main obstacle for obtaining human-independent and mor...

Performance Evaluation of Document Structure Extraction Algorithms

This paper presents a performance metric for the document structure extraction algorithms by finding the correspondences between detected entities and ground truth. We describe a method for determining an algorithm’s optimal tuning parameters. We evaluate a group of document layout analysis algorithms on 1600 images from the UW-III Document Image Database, and the quantitative performance measu...

Methodology for Flexible and Efficient Analysis of the Performance of Page Segmentation Algorithms

This paper presents part of a new DIA performance analysis framework aimed at Layout Analysis algorithm developers. A new region-representation scheme (an interval-based description of isothetic polygons) and a corresponding comparison approach are introduced. These enable fast and accurate geometric comparison of ground-truth with results of page segmentation, improving on current evaluation m...

Fast and Accurate Ground Truth Generation for Skew-Tolerance Evaluation of Page Segmentation Algorithms

Many image segmentation algorithms are known, but often there is an inherent obstacle in the unbiased evaluation of segmentation quality: the absence or lack of a common objective representation for segmentation results. Such a representation, known as the ground truth, is a description of what one should obtain as the result of ideal segmentation, independently of the segmentation algorithm us...

Document Layout Structure Extraction Using Bounding Boxes of Different Entities

This paper presents an efficient and accurate technique for document page layout structure extraction and classification by analyzing the spatial configuration of the bounding boxes of different entities on a given image. The text, table, and nontext structures are detected on document images. The text-lines and words are extracted and the tabular structure is further decomposed into row and column ...

Publication date: 2006